Evaluation of Fast Spoken Term Detection Using a Suffix Array

Authors

  • Kouichi Katsurada
  • Shinta Sawada
  • Shigeki Teshima
  • Yurie Iribe
  • Tsuneo Nitta
Abstract

We previously proposed [1] a fast spoken term detection method that uses a suffix array as a data structure for searching large-scale speech documents. In this method, a keyword is divided into sub-keywords, and the phoneme sequences that contain two or more sub-keywords are output as results. Although the search runs very quickly on a 10,000-hour speech database, in [1] we only proposed a variety of matching procedures and did not compare them. In this paper, we compare matching procedures that differ in the number of phonemes per sub-keyword and in the number of sub-keywords a search result is required to contain. We also compare the detection performance and processing time of our method with those of typical spoken term detection using an inverted index.
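The keyword-division idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exact matching of sub-keywords (no tolerance for recognition errors), a naive suffix array built by sorting, and hypothetical function names and parameters (`n` for phonemes per sub-keyword, `k` for the required number of matching sub-keywords) chosen only for illustration.

```python
from collections import Counter


def build_suffix_array(seq):
    """Return the start positions of all suffixes of seq in lexicographic order.

    Naive construction by sorting; fine for a toy example only.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def find_occurrences(seq, sa, pattern):
    """Return sorted start positions where pattern occurs exactly in seq."""
    m = len(pattern)
    # Lower bound: first suffix whose m-phoneme prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-phoneme prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def split_keyword(keyword, n):
    """Split keyword into consecutive sub-keywords of n phonemes, with offsets.

    The last sub-keyword may be shorter than n (an assumption of this sketch).
    """
    return [(i, keyword[i:i + n]) for i in range(0, len(keyword), n)]


def detect(seq, sa, keyword, n, k):
    """Return candidate keyword start positions supported by >= k sub-keywords."""
    votes = Counter()
    for offset, sub in split_keyword(keyword, n):
        for pos in find_occurrences(seq, sa, sub):
            start = pos - offset  # where the whole keyword would begin
            if start >= 0:
                votes[start] += 1
    return sorted(s for s, v in votes.items() if v >= k)


# Toy "phoneme" sequence in which each character stands for one phoneme.
seq = list("xxabcdefxxabzzefxx")
sa = build_suffix_array(seq)
candidates = detect(seq, sa, list("abcdef"), n=2, k=2)
```

In this sketch, `n` and `k` correspond to the two quantities the paper varies: the sub-keyword length and the number of sub-keywords a result must contain. Raising `k` makes the filter stricter, while lowering it admits candidates in which some sub-keywords are missing.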


Similar resources

Acceleration of spoken term detection using a suffix array by assigning optimal threshold values to sub-keywords

We previously proposed a fast spoken term detection method that uses a suffix array data structure for searching large-scale speech documents. The method reduces search time via techniques such as keyword division and iterative lengthening search. In this paper, we propose a statistical method of assigning different threshold values to sub-keywords to further accelerate search. Specifically, th...


Utilizing Confusion Network in the STD with Suffix Array and Its Evaluation on the NTCIR-11 SpokenQuery & Doc SQ-STD Task

The authors have proposed a fast spoken term detection method that uses a suffix array as a data structure. This method enables very quick and memory-saving search by using such techniques as keyword division, dynamic time warping, and an articulatory-feature-based local distance definition. In this paper, we investigate a new approach that utilizes a confusion network in the suffix array. T...


Using Multiple Speech Recognition Results to Enhance STD with Suffix Array on the NTCIR-10 SpokenDoc-2 Task

We have previously proposed a fast spoken term detection method that uses a suffix array as a data structure. By applying dynamic time warping on a suffix array, we achieved very quick keyword detection from a very large-scale speech document. In this study, we modify our method so that it can deal with multiple recognition results. By using these results obtained from various speech recognizer...


STD Method Based on Hash Function for NTCIR11 SpokenQuery&Doc Task

In this paper, we describe a spoken term detection (STD) method used in the Spoken Query and Documents task of the NTCIR-11 meeting. Our STD method extracts sub-sequences from the syllable-based speech recognition candidates of the target speech and converts them into bit sequences using a hash function. The query is also converted into a bit sequence in the same way. Term detection candidates ...


Ultra-fast Multiple Genome Sequence Matching Using GPU

In this paper, a contrastive evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching is proposed, based on an Intel Core i7 3770K quad-core CPU and an NVIDIA GeForce GTX680 GPU. Besides the suffix array requiring only approximately 20%~30% of the space of the suffix tree, the coalesced binary search and tile optimization make suffix array clearl...




Publication date: 2011